Statistical Modelling of Speech Segment Duration by Constrained Tree Regression
Authors
Abstract
This paper presents a new method for the statistical modelling of prosody control in speech synthesis. The proposed method, referred to as Constrained Tree Regression (CTR), can suitably represent the complex effects of prosody control factors with a moderate amount of learning data. It is based on recursive splits of predictor variable spaces and the partial imposition of constraints of linear independence among predictor variables. It incorporates both linear and tree regressions with categorical predictor variables, which have conventionally been used for prosody control, and extends them to more general models. In addition, a hierarchical error function is presented to account for the hierarchical structure of prosody control. The new method is applied to the modelling of speech segmental duration. Experimental results show that the proposed regression method yields better duration models than linear and tree regressions with the same number of free parameters. They also show that the hierarchical structure of phoneme and syllable durations can be represented efficiently using the hierarchical error function.
Key words: speech segmental duration, statistical modelling, regression
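For orientation, the two conventional baselines named in the abstract (linear regression and tree regression with categorical predictor variables) can be sketched as follows. This is not CTR itself, whose constraint scheme the abstract does not spell out. The toy durations, the two binary factors (stressed, phrase-final) and all function names are assumptions made for illustration only, and the additive fit shown is valid only because the toy design is balanced.

```python
from statistics import mean

# Hypothetical toy data: (stressed, phrase_final) -> segment duration in ms.
# The values are invented for illustration and do not come from the paper.
DATA = [
    ((0, 0), 60.0), ((0, 0), 64.0),
    ((0, 1), 80.0), ((0, 1), 84.0),
    ((1, 0), 90.0), ((1, 0), 94.0),
    ((1, 1), 150.0), ((1, 1), 154.0),
]


def sse(rows):
    """Sum of squared errors of the targets around their mean."""
    if not rows:
        return 0.0
    m = mean(y for _, y in rows)
    return sum((y - m) ** 2 for _, y in rows)


def fit_tree(rows, depth):
    """Tree regression: recursively split on the binary factor that lowers SSE most."""
    if depth == 0 or len({x for x, _ in rows}) == 1:
        return mean(y for _, y in rows)  # leaf: predict the mean duration
    n_factors = len(rows[0][0])
    # Only factors that still take both values in this node can be split on.
    candidates = [j for j in range(n_factors) if len({x[j] for x, _ in rows}) == 2]

    def split_cost(j):
        return sum(sse([r for r in rows if r[0][j] == v]) for v in (0, 1))

    best = min(candidates, key=split_cost)
    return (
        best,
        fit_tree([r for r in rows if r[0][best] == 0], depth - 1),
        fit_tree([r for r in rows if r[0][best] == 1], depth - 1),
    )


def predict_tree(tree, x):
    """Walk from the root to a leaf and return the leaf mean."""
    while isinstance(tree, tuple):
        j, left, right = tree
        tree = right if x[j] == 1 else left
    return tree


def fit_additive(rows):
    """Additive linear model for a balanced design: grand mean plus main effects."""
    grand = mean(y for _, y in rows)
    n_factors = len(rows[0][0])
    effects = [
        {v: mean(y for x, y in rows if x[j] == v) - grand for v in (0, 1)}
        for j in range(n_factors)
    ]
    return grand, effects


def predict_additive(model, x):
    """Sum the grand mean and the main effect of each factor level."""
    grand, effects = model
    return grand + sum(effects[j][x[j]] for j in range(len(x)))


tree = fit_tree(DATA, depth=2)
additive = fit_additive(DATA)
print(predict_tree(tree, (1, 1)), predict_additive(additive, (1, 1)))
```

On this toy data the full tree uses four leaf means and reproduces every cell mean (152 ms for the stressed, phrase-final cell), while the additive model uses three free parameters and predicts 142 ms there, because it cannot express the interaction between the two factors. That trade-off between flexibility and parameter count is the one the abstract says CTR is designed to balance.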
Similar resources
An Overview of Prosodic Modelling for Croatian Speech Synthesis
To include prosody in text-to-speech (TTS) systems, prosody knowledge needs to be acquired, represented and incorporated. The two main features of prosody that matter for modelling prosody in TTS systems are duration and F0 contour. There are various approaches to modelling those features, and they can be categorized into three main groups: rule based, statistical and minimalistic. Some...
Bayesian Modelling Of Vowel Segment Duration For Text-to-Speech Synthesis Using Distinctive Features
We apply a Bayesian belief network (BN) approach to vowel duration modelling, whereby vowel segment duration is modelled as a hybrid Bayesian network consisting of discrete and continuous nodes, with the nodes in the network representing linguistic factors that affect segment duration. Factor interaction is modelled in a concise way by causal relationships among the nodes in a directed acyclic ...
MIMIC : a voice-adaptive phonetic-tree speech synthesiser
This paper presents Mimic: a decision-tree-based, concatenative, voice-adaptive text-to-speech synthesiser. Mimic integrates text-to-speech synthesis (TTS) with speech recognition and speaker adaptation. Speech is synthesised by concatenating triphone synthesis units. The triphone units are obtained from clusters of training examples modelled, labelled and segmented using clustered HMMs and...
Learning duration
In this paper, we investigate ways to enhance the statistical modelling of segment duration for speech synthesis. In particular, we look at the effects of gradually increasing the size of the training data and at specific problems of phonetic coding. We show that questions arising from the inherent mismatch between canonical phonemic representation and phonetic realisation are best answered by sta...
Using bayesian belief networks for model duration in text-to-speech systems
The problems of database imbalance and factor interaction make modelling of segment duration in text-to-speech systems a challenging task. We therefore propose a probabilistic Bayesian belief network (BN) approach to tackle the data sparsity and factor interaction problems. The belief network approach makes good estimates in cases of missing or incomplete data. Also, it captures factor interaction...
Publication date: 2000